Introduction

Our goal for this project was to investigate the correlation between gas prices and Metro Transit ridership. Through visualisations and analysis, we wanted to help Metro Transit understand and prepare for changes in demand for their services. We started out with the research question,

“How do changes in gas prices affect metro transit ridership?”

This was our main research question for the project. After making some preliminary visualizations to investigate the structure and content of our data, we narrowed our question to;

“What is the correlation between gas prices and ridership in areas with different median incomes in the Twin Cities?”

We then created visualizations and an app in R Shiny to illustrate our findings on the correlation between gas price, Metro Transit ridership, and median income.

Data Collections

For this project, our data came from several different sources.

We started with the Metro Transit data provided by Eric Lind to find information on daily ridership, and supplemented that with with the Nextrip API to locate each bus route.

For Minnesota gas pricing, we used data from the Energy Information Administration. We used data for the entire state of Minnesota, both because that was the most reliable data we were able to find and because those prices are still very representative of the twin cities. From these sources, we were able to create two scatterplot of historic Minnesota gas prices, one without ridership and one including bus ridership trends.

We used data from the Chicago Transit Authority to find information Chicago bus ridership. With this, we were able to create another scatterplot with historic Chicago gas prices and and bus ridership to compare to the Minnesota graphs.

Lastly, for information on median income in the Twin Cities, we used data on census tracts from the U.S. Census Bureau. With this information, we were able to create the RShiny App which compare median incomes, gas prices, and bus ridership in the twin cities all in one platform. From this, we were able to find many correlation values between gas price, bus ridership, and income, which we were able to plot on two different visualizations (one as a density plot and one as a scatterplot).

Setting Up

library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(readxl)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date

Data Frames

Frame 1: Minnesota Gas Prices

Minnesota Regular Retail Gas Prices Data Frame

library(readxl)
GasPrices_Regular <- read_excel("~/METRO TRANSIT/Gas1.xlsx")
colnames(GasPrices_Regular) <- c("Date", "Price")
GasPrices_Regular <- GasPrices_Regular[-c(1, 2),]
GasPrices_Regular$Date <- as.numeric(GasPrices_Regular$Date)
GasPrices_Regular$Date <- as.Date(GasPrices_Regular$Date, origin="1899-12-30")

GasPrices_Regular

Frame 2: Minnesota Ridership and Gas Prices

Metro Transit Ridership Data Joined to Minnesota Gas Prices Data Frame

metro_gas <- read_csv("~/METRO TRANSIT/metro_gas.csv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
##   X1 = col_integer(),
##   dtDate = col_datetime(format = ""),
##   Schedule = col_character(),
##   Route = col_integer(),
##   RouteType = col_character(),
##   SumOfTrips = col_integer(),
##   Local_Non_Rush = col_integer(),
##   Local_Rush = col_integer(),
##   Express_Non_Rush = col_integer(),
##   Express_Rush = col_integer(),
##   Total_Riders = col_double(),
##   Date = col_date(format = ""),
##   Price = col_double()
## )
metro_gas

Frame 3: Chicago Gas Prices

Chicago Regular Retail Gas Prices Data Frame

chicago_prices <- read_csv("~/METRO TRANSIT/Chicago1.csv", 
    col_types = cols(Date = col_date(format = "%m/%d/%Y")))

chicago_prices

Frame 4: Chicago Ridership and Gas Prices

Chicago Transit Authority Ridership Data Joined to Chicago Gas Prices Data Frame

chicago <- read_csv("~/METRO TRANSIT/CTA_-_Ridership_-_Bus_Routes_-_Daily_Totals_by_Route.csv", 
    col_types = cols(date = col_date(format = "%m/%d/%Y")))

colnames(chicago) <- c("Route", "Date","RouteType","Total_Riders")

chicago_gas <- chicago %>%
  inner_join(chicago_prices)
## Joining, by = "Date"
chicago_gas

Frame 5: Median Income by Count

Getting ACS Data for Median Income of Citizens by County in Minnesota

mini <- acs <- read_csv("~/METRO TRANSIT/acs.csv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   X1 = col_integer()
## )
## See spec(...) for full column specifications.
mini

Frame 6: Correlation by Route and Median Income

Getting Correlation Values between Gas Price and Ridership by Route for Minnesota

#Example given for route 7
metro_cor <- metro_gas %>% 
  select(Route, Price, Total_Riders) %>%
  filter(Route==7)

metro_cor$Price <- as.numeric(metro_cor$Price)

metro_cor$Total_Riders <- as.numeric(metro_cor$Total_Riders)

cor(metro_cor$Price, metro_cor$Total_Riders)
## [1] 0.4498628
#0.4498628

#Calculate Correlations for each route manually and combine into an excel table. Combine values in Excel Table containing Routes, then import

correlations <- read_excel("~/METRO TRANSIT/correlations.xlsx")
correlations

Visualizations

Figure 1: Minnesota Gas Prices

Gas Price Over Time For Minnesota

ggplot(metro_gas, aes(x=Date, y=as.numeric(Price))) +
  geom_jitter(color="#D9A6EC") +
  theme_minimal() +
  labs(title="Gas Price Over Time", x="Date", y="Gas Price", caption = "Source: Eric's Metro Transit")

For Minneapolis, the gas prices flactuated seasonally and seemed to stabilize to a middle value ($2.0 per gallon) around 2016. The highest price was around 3.5 dollars per gallon and the lowest was at 1.5 dollars per gallon.

Figure 2: Minnesota Gas Prices and Ridership

Gas Price and Ridership Over Time for Minnesota

ggplot(metro_gas, aes(x=Date, y=as.numeric(Price), color=Total_Riders)) +
  geom_jitter() +
  theme_minimal() +
  scale_color_gradient(low = "#E3DE8F", high = "#D9A6EC", name="Total Riders") +
  labs(title="Gas Price and Ridership Over Time", x="Date", y="Gas Price", caption = "Source: Eric's Metro Transit")

The graph above shows how both gas prices and ridership changed over the years. The ridership had been quite high before 2016 regardless of flactuations in gas prices since the color stayed relatively purple. Once the prices stabilized slightly after 2016, the ridership decreased as the points became less purple and more cream colored.

Figure 3: Chicago Gas Prices and Ridership

Plot of Gas Prices and Ridership over Time For Chicago

chicago_metro  <- chicago_gas %>% 
  group_by(Date,price,Total_Riders) %>%
  filter(Total_Riders < 20000)

#Chicago Data set has more dates with fewer riders (close to 1000) than others and 1 or 2 dates with 40,000 riders which caused a huge disparity in the scale and made it difficult to interpret the graph if there wasn't a limit to the total riders

ggplot(chicago_metro, aes(x=Date, y=price, color=Total_Riders)) +
  geom_point()+
  scale_x_date(date_breaks = "6 month",
                 limits = as.Date(c('2014-06-01','2017-11-09'))) +
  theme_minimal() +
  scale_color_gradient(low = "#E3DE8F", high = "#D9A6EC", "Total Riders") +
  labs(title="Chicago Gas Price & Ridership Over Time (2014 to 2017)", x="Date", y="Gas Price", caption = "Source: CTA API")+ 
  theme(axis.text.x = element_text(face="bold", size=10, angle=45))
## Warning: Removed 88314 rows containing missing values (geom_point).

Chicago had a similar variation in gas prices. They were seasonal and had generally stabilized to a middle value ($2.5 per gallon) around 2016. However, in Chicago, the number of riders was generally unchanging and only had spikes on distinct days. It was not that subject to changes in gas prices and this may be because they also had trains and bicycles that were very popular and may have competed with buses for riders.

Figure 4: Shiny App

Shiny App for Routes, Counties and Metro Transit Garages in the Twin Cities

Figure Showing Layout of Shiny App

Figure Showing Layout of Shiny App

Shiny App Ui

library(shinydashboard)
library(leaflet)
library(readr)
library(ggvis)
library(shinythemes)

metro_gas <- read_csv("C:/Users/Macalester User/Documents/metro_gas.csv")
trips  <- readRDS("C:/Users/Macalester User/Desktop/MyApp/trips.rds")
shapes <- readRDS("C:/Users/Macalester User/Desktop/MyApp/shapes.rds")


header <- dashboardHeader(
  title = "Twin Cities Routes"
)

body <- dashboardBody(skin = "green",
  fluidRow(
    column(width = 6,
           box(width = NULL, solidHeader = TRUE,
               leafletOutput("busmap", height = 500)
           )
    ),
    
    column(width = 6,
           box(width = NULL, status = "warning",
               uiOutput("routeSelect")
           ),
           box( width = NULL,
                ggvisOutput("cocaine_state")
                
    )
  )

)
)

fluidPage(
  
  tags$head(
    tags$style(HTML("
                    @import url('//fonts.googleapis.com/css?family=Lobster|Cabin:400,700');
                    "))
    ),
  
dashboardPage(skin="red",
  header,
  dashboardSidebar(disable = TRUE),
  body
)
)

Shiny App Server

library(shinydashboard)
library(acs)
library(leaflet)
library(dplyr)
library(readr)
library(curl) # make the jsonlite suggested dependency explicit
library(tigris)
library(htmltools)


# http://svc.metrotransit.org/NexTrip/help
getMetroData <- function(path) {
  url <- paste0("http://svc.metrotransit.org/NexTrip/", path, "?format=json")
  jsonlite::fromJSON(url)}

metro_gas <- read_csv("metro_gas_data.csv")
trips  <- readRDS("trips.rds")
shapes <- readRDS("shapes.rds")
houses <- read_csv("garage.csv")
stops <- read_csv("bus-stops-Oct-2017.csv")
mini <- read_csv("acs.csv")


census_tracts <- tracts(state = "MN", county = c("Anoka County","Carver County","Dakota County","Chisago County","Hennepin County","Ramsey County","Washington County","Sibley County","Scott County","Sherburne County","Wright County","Pierce County"," Mille Lacs County","Isanti County","Le Sueur County","St. Croix County"))

data_geo <- geo_join(census_tracts, mini, by_sp = "GEOID", by_df = "GEO.id2", how = "inner")

get_route_shape <- function(route) {
  routeid <- paste0(route, "-75")
  
  shape_counts <- trips %>%
    filter(route_id == routeid) %>%
    group_by(shape_id) %>%
    summarise(n = n()) %>%
    arrange(-n)
  
  shapeid <- shape_counts$shape_id[1]

  shapes %>% filter(shape_id == shapeid)
}


function(input, output, session) {
  

  output$routeSelect <- renderUI({
    live_vehicles <- getMetroData("VehicleLocations/0")
    
    routeNums <- sort(unique(as.numeric(live_vehicles$Route)))
    names(routeNums) <- routeNums
    routeNums <- c(All = 0, routeNums)
    selectInput("routeNum", "Route", choices = routeNums, selected = routeNums[2])
  })
  
  bus <- makeIcon(
    iconUrl = "bus.png",
    iconWidth = 30, iconHeight = 30
  )
  
  
  output$busmap <- renderLeaflet({
    
    
    data_geo$HC02_EST_VC02 <- as.numeric(data_geo$HC02_EST_VC02)
    
    data_geo$HC02_EST_VC02[is.na(data_geo$HC02_EST_VC02)] <- 0
    
    bins <- c(0, 1.18e+04,4.23e+04, 7.27e+04, 1.03e+05, 1.33e+05, 1.64e+05, 1.94e+05, Inf)
    
    pal <- colorBin("BuPu", domain = data_geo$HC02_EST_VC02, bins = bins)
    
    
    labels <- sprintf(
      "<strong>%s</strong><br/>%g Median Income<sup>2</sup>",
      data_geo$GEO.display.label, data_geo$HC02_EST_VC02
    ) %>% lapply(htmltools::HTML)
    
    
    map <-  leaflet(data = data_geo) %>% addTiles() %>%
      addTiles() %>%
      addMiniMap(position="topright") %>%
      addProviderTiles(providers$CartoDB.Positron) %>%
      addMarkers(data=houses, icon=bus, ~long, ~lat, popup= ~as.character(garage),label= ~as.character(garage)) %>%
      addPolygons(data=data_geo,
                  fillColor = ~pal(HC02_EST_VC02),
                  weight = 1,
                  opacity = 0.5,
                  color = "white",
                  dashArray = "3",
                  fillOpacity = 0.5,
                  highlight = highlightOptions(
                    weight = 3,
                    color = "#666",
                    dashArray = "",
                    fillOpacity = 0.7,
                    bringToFront = TRUE),
                  label = labels,
                  labelOptions = labelOptions(
                    style = list("font-weight" = "normal", padding = "3px 8px"),
                    textsize = "15px",
                    direction = "auto")) %>%
      addLegend(pal = pal, values = ~density, opacity = 0.7,
                position = "bottomright", title = "Median Income (2015)")
    
    
    if (as.numeric(input$routeNum) != 0) {
      route_shape <- get_route_shape(input$routeNum)
      
      map <- addPolylines(map,
                          route_shape$shape_pt_lon,
                          route_shape$shape_pt_lat,
                          fill = FALSE,
                          color="black",
                          smoothFactor = 3,
                          
      )
    }
    
    
    
    map
  })
  
  rct_cocaine <- reactive({
    metro_gas %>% filter((metro_gas$Route %in% input$routeNum))  
  })
  
  # works OK
  rct_cocaine %>%
    ggvis(~Price, ~Total_Riders) %>%
    add_axis("x", title= "Gas Price (U.S. Dollars)") %>%
  add_axis("y", title = "Total Ridership", title_offset = 50) %>%
    layer_points(fill = ~Route, size := 50, opacity := 0.5, fill := "purple") %>%
    layer_model_predictions(model="lm") %>%
    bind_shiny("cocaine_state") 
}

We used the shiny app to filter out and display each route individually. The app also plotted how each route’s ridership changes as gas prices changed. We also used the app to find the individual correlations between ridership and gas prices by route as values. The app also showed what counties the routes passed through and what the median income of the people living in those counties were. Lastly, the app told us where the Metro Transit Garages were located. This helped us know which garage served specific routes.

Figure 5: Density Plot

Density Plot of Correlations Between Ridership and Gas Prices for Routes

ggplot(correlations, aes(x=correlation)) +
  geom_density(fill="#BE95C4", color="#D9A6EC") +
  theme_minimal() +
  labs(title="Correlations between Gas Price and Ridership by Route", x="Correlation", y="Frequency", caption = "Source:EIA & ACS")

This Density plot shows the frequency of the correlations between ridership and prices. These were plotted using the individual correlations of each route from the ‘correlations’ data frame. The data frame was made using the shiny app.

Figure 6: Combined Plot

Dot Plot of Correlations Between Ridership and Gas Prices Against Median Income For Minnesota

library(ggplot2)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
pMain <- ggplot(correlations, aes(x = median_income, y =correlation )) +
         geom_point(color="#BE95C4") +
       theme_minimal() +
  geom_smooth(method=lm, color="#5E548E") +
  labs( x="Median Income", y="Correlation", title="Correlations between Ridership and Price Vs Median Income", caption = "Source : ACS & EIA Data") +
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=14,face="bold"))

pRight <- ggplot(correlations, aes(x = correlation)) +
        geom_histogram(fill="#BE95C4") + coord_flip() + theme_minimal() +
  labs(x="",y="") +
  theme(,axis.text.x = element_blank())

pEmpty <- ggplot(correlations, aes(x = median_income, y = correlation)) +
          geom_blank() +
          theme(axis.text = element_blank(),
                axis.title = element_blank(),
                line = element_blank(),
                panel.background = element_blank()) 

pTop <- ggplot(correlations, aes(x = median_income, y = correlation)) +
          geom_blank() +
          theme(axis.text = element_blank(),
                axis.title = element_blank(),
                line = element_blank(),
                panel.background = element_blank()) 

grid.arrange(pTop, pEmpty, pMain, pRight,
             ncol = 2, nrow = 2, widths = c(3, 1), heights = c(1, 3))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The above graph is similar to the density plot but also shows how the correlations are affected by the median income. The density plot from before is displayed as a vertical histogram on the right. It was also made using the ‘correlations’ data frame and the shiny app.

Analysis

  • Figure 1 shows gas prices in Minnesota on a weekly basis from 2014 to 2017. While there has been a general decrease in gas prices over the years, it is clear that gas prices fluctuate on a seasonal basis, with gas prices at their lowest around fall and winter, and at their highest around summer and spring.

  • Figure 2 is an analysis of gas price and Metro Transit ridership over time, which examines trends in Minnesota. It shows that the highest values for ridership are around the spring and summer which is also the time that gas prices are at their highest, indicating a somewhat positive correlation between gas price and Metro Transit ridership.

  • Figure 3 is an analysis of gas price and ridership over time in Chicago, in order to show a comparison between two different areas. It shows that the seasonal trends seen in gas price vs time in Minnesota hold true in other areas (such as Chicago). However, in Chicago there is no discernable trend between ridership and gas prices over time. It should be noted that the Chicago data does not include its popular train transit (only buses) and that Chicago has significant bike usership and infrastructure, which may also cause the differences in data.

  • Figure 5 is an analysis of how frequently correlations (examining gas price vs ridership) of differing values exist among the routes examined. It is clear that the most common correlations exist show between a 0 and .25 correlation strength, as indicated by a bell curve. Most correlations are positive to some extent, indicating that there is a general positive correlation between gas price and Metro Transit ridership in the Twin Cities area.

  • Figure 6 examines how much of a positive correlation (examining gas price vs ridership) exists in areas depending on the median income of the areas. This data shows that the higher the median income of an area, the more likely it is to display a higher positive correlation between gas price and ridership. There are a cluster of outliers for routes in very high income areas, which show low amounts of positive correlation (compared to other high income areas).

Key Findings

  1. Gas Prices and Ridership at their highest around spring and summer, lowest around winter and fall.
  2. Most areas in the Twin Cities exhibit a slightly positive correlation between Metro Transit ridership and gas prices.
  3. Higher income areas are likely to exhibit greater positive correlation between Metro transit ridership and gas prices.

Take Aways for Metro Transit

  1. Demand for public transit is likely to increase in spring and summer, and decrease in winter and fall.
  2. Areas with higher incomes are likely to experience the greatest increases in demand for Metro Transit buses in the event of gas prices increasing.